17 research outputs found

    Human-aware application of data science techniques

    In recent years there has been an increase in the use of artificial intelligence and other data-based techniques to automate decision-making in companies and to discover new knowledge in research. In many cases this has been done using very complex algorithms (so-called black-box algorithms), which are capable of detecting very intricate patterns but unfortunately remain nearly uninterpretable. Recently, many researchers and regulatory institutions have begun to raise awareness about their use. On the one hand, the subjects affected by these decisions increasingly question their use, as they may be victims of biases or erroneous predictions. On the other hand, the companies and institutions that use these algorithms want to understand what their algorithms do, extract new knowledge, prevent errors, and improve their predictions in general. As a result, researchers have started to focus on the interpretability of their algorithms (for example, through explainable algorithms), and regulatory institutions have started to regulate the use of data to ensure ethical aspects such as accountability and fairness. This thesis brings together three data science projects in which black-box predictive machine learning has been implemented to make predictions:
    - The development of a Non-Technical Losses (NTL) detection system for an international utility company from Spain (Naturgy). We combine a black-box algorithm with an explanatory algorithm to guarantee our system's accuracy, transparency, and robustness. Moreover, we focus our efforts on empowering the stakeholders to play an active role in the model training process.
    - A collaboration with the University of Padova to provide explainability to a Deep Learning-based KPI system currently implemented by the MyInvenio company.
    - A collaboration between the author of the thesis and the Universitat de Barcelona to apply an AI solution (a black-box algorithm combined with an explanatory algorithm) to a social science problem.
    The unique characteristics of each project allow us to offer in this thesis a comprehensive analysis of the challenges that must be overcome to achieve a fair, transparent, unbiased, and generalisable use of data in a data science project. With the feedback arising from the research carried out to provide satisfactory solutions to these three projects, we aim to:
    - Understand the reasons why a prediction model can be regarded as unfair or untruthful, making the model not generalisable, and the consequences both from a technical point of view (low model accuracy) and for society.
    - Determine and correct (or at least mitigate) the situations that cause robustness and fairness problems in our data.
    - Assess the difference between interpretable algorithms and black-box algorithms, and evaluate how well explanatory algorithms can explain the predictions made by predictive algorithms.
    - Highlight the stakeholder's role in guaranteeing a robust model, and show how to convert a data-driven approach to a predictive problem into a data-informed approach, where data patterns and human knowledge are combined to maximise profit.

    Collaborative Filtering Ensemble for Personalized Name Recommendation

    Final degree project completed at the Leibniz Universität Hannover, Fakultät für Elektrotechnik und Informatik. Out of thousands of names to choose from, picking the right one for your child is a daunting task. In this thesis, our objective is to help parents make an informed decision while choosing a name for their baby. To this end, we follow a recommender system approach and explore different methods for given name recommendation. Our final approach combines, in an ensemble, the individual rankings produced by simple collaborative filtering algorithms in order to produce a personalized list of names that meets the individual parents' taste.
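    The abstract does not specify how the individual rankings are combined; a common rank-aggregation choice for this kind of ensemble is Borda counting, sketched below with illustrative names and weights (none of which come from the thesis):

```python
# Hypothetical sketch: aggregating per-algorithm name rankings with a
# (weighted) Borda count. Names and rankings are toy stand-ins.

def borda_ensemble(rankings, weights=None):
    """Merge several ranked name lists (best-first) into one ranking.

    rankings: list of lists, one ranking per collaborative filtering algorithm.
    weights:  optional per-algorithm weights (defaults to equal weighting).
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = {}
    for ranking, w in zip(rankings, weights):
        n = len(ranking)
        for pos, name in enumerate(ranking):
            # Borda: first place earns n-1 points, last place earns 0.
            scores[name] = scores.get(name, 0.0) + w * (n - 1 - pos)
    return sorted(scores, key=scores.get, reverse=True)

# Three toy rankings standing in for, e.g., user-based CF, item-based CF,
# and a popularity baseline:
r1 = ["Emma", "Lena", "Paul", "Finn"]
r2 = ["Lena", "Emma", "Finn", "Paul"]
r3 = ["Paul", "Emma", "Lena", "Finn"]
print(borda_ensemble([r1, r2, r3]))  # ['Emma', 'Lena', 'Paul', 'Finn']
```

    The weights would let a validation step favour whichever base algorithm matches the parents' taste best.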

    Bridging the gap between energy consumption and distribution through non-technical loss detection

    The application of Artificial Intelligence techniques in industry equips companies with essential new tools to improve their principal processes. This is especially true for energy companies, which have the opportunity, thanks to the modernization of their installations, to exploit a large amount of data with smart algorithms. In this work we explore the possibilities of implementing Machine Learning techniques for the detection of Non-Technical Losses in customers. The analysis is based on work done in collaboration with an international energy distribution company. We report on how success in detecting Non-Technical Losses can help the company better control the energy provided to its customers, avoiding misuse and hence improving the sustainability of the service that the company provides.

    Non-technical losses detection in energy consumption focusing on energy recovery and explainability

    Non-technical losses (NTL) are a problem that many utility companies try to solve, often using black-box supervised classification algorithms. In general, this approach achieves good results. However, in practice, NTL detection faces technical, economic, and transparency challenges that cannot be easily solved and which compromise the quality and fairness of the predictions. In this work, we contextualise these problems in an NTL detection system built for an international utility company. We explain how we have mitigated them by moving from a classification to a regression system and by introducing explanatory techniques to improve its accuracy and understanding. As we show in this work, the regression approach can be a good option to mitigate these technical problems, and it can be adjusted to capture the most striking NTL cases. Moreover, explainable AI (through Shapley values) allows us both to validate the correctness of the regression approach in this context beyond benchmarking, and to improve the transparency of our system drastically.
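    To illustrate the explainability ingredient, here is a minimal, self-contained sketch of Shapley values estimated by permutation sampling over a black-box regressor. The abstract's system likely uses an optimised implementation such as the SHAP library; the model and feature values below are toy stand-ins:

```python
# Hedged sketch: Monte-Carlo Shapley values for a black-box regressor.
import random

def shapley_values(predict, x, baseline, n_samples=200, seed=0):
    """Approximate each feature's Shapley contribution to predict(x).

    predict:  black-box model, maps a feature list to a number.
    x:        instance to explain.
    baseline: reference values a feature takes when it is 'absent'.
    """
    rng = random.Random(seed)
    m = len(x)
    phi = [0.0] * m
    for _ in range(n_samples):
        order = rng.sample(range(m), m)   # random feature ordering
        z = list(baseline)
        prev = predict(z)
        for j in order:
            z[j] = x[j]                   # switch feature j 'on'
            cur = predict(z)
            phi[j] += cur - prev          # marginal contribution of j
            prev = cur
    return [p / n_samples for p in phi]

# Toy 'black-box': for a linear model, the Shapley value of feature j
# is exactly w_j * (x_j - baseline_j), so the estimate has no noise here.
w = [2.0, -1.0, 0.5]
predict = lambda z: sum(wi * zi for wi, zi in zip(w, z))
print(shapley_values(predict, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0]))
# [2.0, -1.0, 0.5]
```

    The estimates always satisfy local accuracy: they sum to predict(x) minus predict(baseline), which is what makes them usable for validating a regression model's behaviour case by case.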

    A human-in-the-loop approach based on explainability to improve NTL detection

    Implementing systems based on Machine Learning to detect fraud and other Non-Technical Losses (NTL) is challenging: the available data is biased, and the algorithms currently used are black boxes that cannot easily be trusted or understood by stakeholders. This work explains our human-in-the-loop approach to mitigating these problems in a real system that uses a supervised model to detect NTL for an international utility company from Spain. This approach exploits human knowledge (e.g. from the data scientists or the company's stakeholders) and the information provided by explanatory methods to guide the system during the training process. This simple, efficient method, which can easily be implemented in other industrial projects, is tested on a real dataset, and the results show that the derived prediction model is better in terms of accuracy, interpretability, robustness, and flexibility.
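    The abstract does not detail the guidance mechanism. One common human-in-the-loop pattern, sketched hypothetically below, is to let domain experts review explanations and then down-weight training samples they flag as unreliable (e.g. labels driven by targeted campaigns rather than genuine NTL) before the next training round; all identifiers here are invented for illustration:

```python
# Hypothetical illustration only; not the paper's exact mechanism.

def reweight_samples(samples, flagged_ids, penalty=0.2):
    """Return per-sample training weights after an expert review pass.

    samples:     list of (sample_id, label) pairs.
    flagged_ids: ids the domain expert marked as unreliable.
    penalty:     weight given to flagged samples (1.0 = untouched).
    """
    return {sid: (penalty if sid in flagged_ids else 1.0)
            for sid, _ in samples}

training_set = [("c1", 1), ("c2", 0), ("c3", 1)]
weights = reweight_samples(training_set, flagged_ids={"c2"})
print(weights)  # {'c1': 1.0, 'c2': 0.2, 'c3': 1.0}
```

    The resulting weights would feed a learner's per-sample weighting (e.g. a `sample_weight` argument), letting expert feedback shape training without manual relabelling.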

    Fraud detection in energy consumption: a supervised approach

    Data from utility meters (gas, electricity, water) is a rich source of information for distribution companies, beyond billing. In this paper we present a supervised technique, which primarily but not exclusively feeds on meter information, to detect meter anomalies and customer fraudulent behavior (meter tampering). Our system detects anomalous meter readings on the basis of models built with machine learning techniques on past data. Unlike most previous work, it can incrementally incorporate the results of field checks to grow the database of fraud and non-fraud patterns, thereby increasing model precision over time and potentially adapting to emerging fraud patterns. The full system has been developed with a company providing electricity and gas, and it has already been used to carry out several field checks, with large improvements in fraud detection over the previous checks, which used simpler techniques.

    A case study of improving a non-technical losses detection system through explainability

    Detecting and reacting to non-technical losses (NTL) is a fundamental activity that energy providers need to face in their daily routines. This is known to be challenging, since the phenomenon of NTL is multi-factored, dynamic, and extremely contextual, which makes artificial intelligence (AI) and, in particular, machine learning natural areas for bringing effective and tailored solutions. If the human factor is disregarded in the process of detecting NTL, there is a high risk of performance degradation, since typical problems like dataset shift and biases cannot be easily identified by an algorithm. This paper presents a case study on incorporating explainable AI (XAI) into a mature NTL detection system that has been in production for the last few years in both electricity and gas. The experience shows that incorporating this capability brings interesting improvements to the initial system and, especially, serves as a common ground where domain experts, data scientists, and business analysts can meet.
